A Maximum Entropy Tagger with Unsupervised Hidden Markov Models

Authors

  • Jun'ichi Kazama
  • Yusuke Miyao
  • Jun'ichi Tsujii
Abstract

We describe a new tagging model in which the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as features in a maximum entropy model. Our method for exploiting unsupervised learning of a probabilistic model can reduce the cost of building taggers when no dictionary and only a small annotated corpus are available. Experimental results on English POS tagging and Japanese word segmentation show that in both tasks our method greatly improves tagging accuracy when the model is trained with a small annotated corpus. Furthermore, our English POS tagger achieved better-than-state-of-the-art POS tagging accuracy (96.84%) when a large annotated corpus is available.
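The core idea of the abstract, using HMM states as features for a maximum entropy tagger, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two-state HMM parameters are hand-set rather than estimated from unannotated text, the states are assigned by Viterbi decoding (the abstract does not say how states are assigned), and the feature names are hypothetical. Training the maximum entropy model itself is omitted.

```python
import math

# Hypothetical 2-state HMM. In the setting the abstract describes, these
# parameters would be estimated from unannotated text by unsupervised
# learning; here they are hand-set purely for illustration.
START = [0.6, 0.4]
TRANS = [[0.3, 0.7], [0.8, 0.2]]
EMIT = [
    {"the": 0.7, "a": 0.2, "dog": 0.05, "barks": 0.05},
    {"the": 0.05, "a": 0.05, "dog": 0.5, "barks": 0.4},
]
UNSEEN = 1e-6  # assumed floor probability for words missing from EMIT


def viterbi(words):
    """Return the most likely hidden-state sequence for a non-empty word list."""
    n_states = len(START)

    def emit(state, word):
        return math.log(EMIT[state].get(word, UNSEEN))

    scores = [math.log(START[s]) + emit(s, words[0]) for s in range(n_states)]
    backptrs = []
    for word in words[1:]:
        new_scores, ptrs = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this position.
            best = max(range(n_states),
                       key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores.append(scores[best] + math.log(TRANS[best][s]) + emit(s, word))
            ptrs.append(best)
        scores = new_scores
        backptrs.append(ptrs)
    # Follow back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def token_features(words, states, i):
    """Feature set for token i: the word plus HMM-state features (names are hypothetical)."""
    feats = {f"word={words[i]}", f"hmm_state={states[i]}"}
    if i > 0:
        feats.add(f"prev_hmm_state={states[i - 1]}")
    return feats


words = ["the", "dog", "barks"]
states = viterbi(words)
features = [token_features(words, states, i) for i in range(len(words))]
print(states)
print(features)
```

In a full tagger, feature sets like these would be passed, together with gold tags from the small annotated corpus, to a maximum entropy (multinomial logistic regression) trainer. The HMM-state features give the classifier a coarse word-class signal even for words that never appear in the annotated data.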

Similar resources

Analysis of Part of Speech Tagging

In the area of text mining, Natural Language Processing is an emerging field. As text is an unstructured source of information, to make it a suitable input to an automatic method of information extraction it is usually transformed into a structured format. Part of Speech Tagging is one of the preprocessing steps which perform semantic analysis by assigning one of the parts of speech to the give...

Part of Speech Tagging with Discriminatively Re-ranked Hidden Markov Models

The task of part of speech tagging has been approached in various ways. Originally performed with hand-crafted disambiguation rules, most tagging is now accomplished with statistical machine learning methods. Two commonly applied statistical methods are hidden Markov models (HMM) and an extension of Markov chains combined with a maximum entropy classifier called m...

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We giv...

Computational Analysis of Part of Speech Tagging

To make text a suitable input to an automatic method of information extraction, it is usually transformed from an unstructured source of information into a structured format. Part of Speech Tagging is one of the preprocessing steps; it assigns one of the parts of speech to the given word. In this paper we discuss various models of supervised and unsupervised technique shown the com...

Taylor Expansion for the Entropy Rate of Hidden Markov Chains

We study the entropy rate of a hidden Markov process, defined by observing the output of a symmetric channel whose input is a first order Markov process. Although this definition is very simple, obtaining the exact amount of entropy rate in calculation is an open problem. We introduce some probability matrices based on Markov chain's and channel's parameters. Then, we try to obtain an estimate ...

Publication date: 2001